Abstract
Background Red blood cell distribution width (RDW) is an easily accessible diagnostic test that is underutilized as a prognostic marker, despite substantial supporting evidence in cancer, cardiovascular diseases, and healthy individuals (Pilling et al. PLoS One 2018;13:e0203504). An association between increased RDW and incident acute myeloid leukemia (AML) has long been suggested in healthy volunteers (Abelson et al. Nature 2018;559:400). We investigated a similar phenomenon in patients with polycythemia vera (PV) or essential thrombocythemia (ET) in large discovery and validation cohorts.
Methods A machine learning (ML) approach was used to identify baseline clinical and laboratory parameters predictive of leukemic transformation (LT) and to develop a predictive model in a large discovery cohort of patients with PV and ET, obtained from electronic medical records of the Maccabi Healthcare Services (MHS; Tel-Aviv, Israel). Receiver operating characteristic (ROC) curve analysis was used to determine the optimal cutoff points for RDW and other continuous variables. Mayo Clinic databases (Rochester, MN, USA) for patients with PV and ET served as validation cohorts. Patients in the validation cohorts were identified according to the International Consensus Classification (Arber et al. Blood 2022;140:1200), whereas identification in the discovery cohort was based on the presence of ICD-9 codes accompanied by documentation of disease-specific driver mutations or consistent treatment records. Statistical analyses were performed using R statistical software version 4.4.2 (November 2024; R Foundation for Statistical Computing, Vienna, Austria) or JMP Pro 18.0.0 software (SAS Institute, Cary, NC, USA).
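As an illustration of the ROC-based dichotomization described above, the following is a minimal sketch that picks the cutoff maximizing Youden's J statistic (sensitivity + specificity - 1). The abstract does not state which optimality criterion was used, so Youden's J is an assumption here; the function name `youden_cutoff` and the toy values are hypothetical and are not study data.

```python
def youden_cutoff(values, events):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1
    for the rule 'value > cutoff predicts the event'.

    Assumption: Youden's J is used as the optimality criterion; the
    abstract does not specify which criterion the authors applied.
    """
    positives = sum(1 for e in events if e)
    negatives = len(events) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one event and one non-event")

    best_cutoff, best_j = None, -1.0
    # Every observed value is a candidate threshold.
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, e in zip(values, events) if e and v > cutoff)
        true_neg = sum(1 for v, e in zip(values, events) if not e and v <= cutoff)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # strict comparison keeps the lowest cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Made-up RDW values (%) and transformation flags, for illustration only.
rdw = [13.1, 13.8, 14.2, 14.9, 15.2, 15.9, 16.4, 17.0]
lt = [0, 0, 0, 0, 0, 1, 1, 1]
cutoff, j = youden_cutoff(rdw, lt)
print(f"RDW > {cutoff} (Youden J = {j:.2f})")
```

In time-to-event data such as these, a time-dependent ROC method would be the more natural choice; this sketch ignores follow-up time and censoring for brevity.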
Results The discovery cohorts from Israel included 4,592 patients with PV (median age 63 years; 53% males) and 5,968 with ET (median age 61 years; 40% males). The validation cohorts from the Mayo Clinic included 633 PV (median age 65 years; 51% males) and 634 ET (median age 60 years; 64% females) patients who were informative for RDW. In the discovery cohorts, median follow-up (incident AML cases) was 6.1 years (N=186; 4.1%) in PV and 6.1 years (N=85; 1.4%) in ET. The corresponding values for the validation cohorts were 6.6 years (N=24; 3.8%) in PV and 7.0 years (N=22; 3.5%) in ET.
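The reported percentages can be reproduced as crude proportions of incident AML cases over cohort size. The short sketch below uses only the counts given above; the variable names are illustrative, and the assumption is that each percentage is events divided by the stated cohort total.

```python
# (incident AML cases, cohort size) as reported in the Results paragraph.
cohorts = {
    "discovery PV": (186, 4592),
    "discovery ET": (85, 5968),
    "validation PV": (24, 633),
    "validation ET": (22, 634),
}

# Crude proportion with incident AML, in percent, rounded to one decimal.
incidence_pct = {
    name: round(100 * events / size, 1)
    for name, (events, size) in cohorts.items()
}

for name, pct in incidence_pct.items():
    print(f"{name}: {pct}%")
```

These are crude proportions over the whole follow-up period, not rates per person-year or cumulative incidence estimates that account for censoring.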
Discovery cohort analyses Covariate-adjusted Cox regression analysis was used to estimate associations between LT and several clinical variables, followed by a machine learning pipeline to predict LT in a given timeframe. Conversion from continuous to binary variables was accomplished by ROC analysis. SHapley Additive exPlanations (SHAP) enabled selection of top-ranked features, which in PV included RDW >15.5 (HR 3.2, 1.1-9.5; p=0.03), age >60 years (HR 2.9, 1.4-5.9; p<0.01), and LDH >1.5 x UNL (HR 4.0, 1.6-10.0; p<0.01). A similar analysis in ET also identified RDW >15.5 (p<0.01) and age >60 years (p=0.01), as well as male sex (p=0.03), as independent risk factors for LT. Subsequent risk models based on these risk factors were externally validated and enabled robust risk stratification, with 10-20 year AUCs ranging from 0.75 to 0.81.
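A reader can check that a reported hazard ratio, its interval, and its p-value are mutually consistent. The sketch below does this for RDW >15.5 (HR 3.2, 1.1-9.5; p=0.03), assuming the interval is a 95% Wald confidence interval on the log-hazard scale; the abstract does not state the interval type, and the function name is hypothetical.

```python
import math


def wald_p_from_hr(hr, lower, upper, z_crit=1.96):
    """Approximate the two-sided Wald p-value from a hazard ratio and its CI.

    Assumption: (lower, upper) is a 95% Wald interval, i.e.
    exp(log(hr) +/- z_crit * se), so se can be recovered from its width.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z, p = wald_p_from_hr(3.2, 1.1, 9.5)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Because the published values are rounded, the recovered p-value is only approximate, but it should agree with the reported one to within rounding.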
Validation cohort analyses The independent predictive value of increased RDW observed in the discovery cohort was confirmed in the validation cohorts for both PV and ET and further refined into sex-specific thresholds: female/male RDW ≥16.4/15.8 in PV (HR 7.9, 1.9-33.8) and ≥16.1/14.4 in ET (HR 5.1, 2.2-11.9). Also confirmed were the additional prognostic contributions of age ≥60 years (HR 3.8, 1.3-10.9) and LDH >1.5 x UNL (HR 3.9, 1.6-9.4) in PV, while no other clinical risk factor showed RDW-independent prognostic relevance in ET. The prognostic impact of RDW in ET was most apparent in females (p<0.01), as opposed to males (p=0.32), whereas the effect was equally significant in both sexes in PV. In multivariable analysis that included previously recognized high-risk mutations in PV (SRSF2) and ET (SF3B1), both RDW and mutations retained significance (p<0.05).
Conclusion The current study identifies higher RDW as a key risk factor for LT in both PV and ET, independent of other clinical or genetic risk factors. Possible underlying mechanisms include increased RDW as a marker of long-term pre-leukemic clones with sub-clinical ineffective erythropoiesis or an associated pro-leukemic inflammatory state.